Improving self-organization of document collections by semantic mapping

Authors

  • Renato Fernandes Corrêa
  • Teresa Bernarda Ludermir
Abstract

In text management tasks, dimensionality reduction becomes necessary for the computation and interpretability of the results generated by machine learning algorithms. This paper describes a feature extraction method called semantic mapping. Semantic mapping, sparse random mapping and PCA are applied to the self-organization of document collections using the self-organizing map (SOM). The behavior of the methods when projecting binary and tf-idf document vector representations is compared. The classification error of the SOM maps in text categorization of the K1 collection was used to compare the performance of the methods. Semantic mapping generated better document representations than sparse random mapping. © 2006 Elsevier B.V. All rights reserved.
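The abstract compares binary and tf-idf document vectors after projecting them into fewer dimensions. As a rough illustration only, the Python sketch below builds both representations from a term-count matrix and projects them with a sparse random mapping, in which each term is assigned a few randomly chosen output dimensions and a reduced vector is the sum of the weights of the terms sharing each dimension. The tf-idf variant (idf = log(N/df), unit-length rows), the parameter `ones_per_term` and all function names are assumptions for this sketch, not details taken from the paper; semantic mapping and PCA are not shown.

```python
import math
import random


def binary_vector(counts):
    """Binary representation: 1.0 if the term occurs in the document, else 0.0."""
    return [1.0 if c > 0 else 0.0 for c in counts]


def tfidf_vectors(count_matrix):
    """tf-idf weights with idf = log(N / df), each document scaled to unit length.

    Assumed variant for illustration; the paper's exact weighting is not given here.
    """
    n_docs = len(count_matrix)
    n_terms = len(count_matrix[0])
    df = [sum(1 for doc in count_matrix if doc[t] > 0) for t in range(n_terms)]
    vectors = []
    for doc in count_matrix:
        w = [doc[t] * math.log(n_docs / df[t]) if df[t] else 0.0 for t in range(n_terms)]
        norm = math.sqrt(sum(v * v for v in w)) or 1.0  # leave all-zero documents as zeros
        vectors.append([v / norm for v in w])
    return vectors


def sparse_random_mapping(n_terms, n_dims, ones_per_term=5, seed=0):
    """For each term, choose `ones_per_term` distinct output dimensions at random.

    `ones_per_term` is an assumed parameter: a sparse projection matrix with that
    many ones per term, stored as the list of chosen dimension indices.
    """
    rng = random.Random(seed)
    return [rng.sample(range(n_dims), ones_per_term) for _ in range(n_terms)]


def project(vector, mapping, n_dims):
    """Reduced vector: each output dimension sums the weights of the terms mapped to it."""
    out = [0.0] * n_dims
    for term, weight in enumerate(vector):
        if weight:
            for d in mapping[term]:
                out[d] += weight
    return out


# Usage on a hypothetical corpus of 3 documents x 6 terms.
counts = [[2, 0, 1, 0, 0, 3],
          [0, 1, 1, 0, 2, 0],
          [1, 0, 0, 4, 0, 0]]
mapping = sparse_random_mapping(n_terms=6, n_dims=4, ones_per_term=2)
reduced_tfidf = [project(v, mapping, 4) for v in tfidf_vectors(counts)]
reduced_binary = [project(binary_vector(doc), mapping, 4) for doc in counts]
```

With a real collection the number of output dimensions would be far smaller than the vocabulary; the same mapping is applied to both representations so that only the weighting differs.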
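The abstract measures each projection by the classification error of the resulting SOM maps. One common way to obtain such an error, assumed here rather than taken from the paper, is to label every map node with the majority class of the training documents it wins and to classify a document by the label of its best-matching unit. The minimal online SOM below (linearly decaying learning rate, Gaussian neighborhood on the grid, hypothetical default parameters and function names) is a sketch under those assumptions.

```python
import math
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bmu_index(nodes, x):
    """Index of the best-matching unit (closest node) for vector x."""
    return min(range(len(nodes)), key=lambda k: sq_dist(nodes[k], x))


def train_som(data, rows, cols, n_iter=2000, lr0=0.5, seed=0):
    """Online SOM on a rows x cols grid; defaults are hypothetical."""
    rng = random.Random(seed)
    nodes = [list(rng.choice(data)) for _ in range(rows * cols)]  # init from random samples
    grid = [(i, j) for i in range(rows) for j in range(cols)]
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood radius
        x = rng.choice(data)
        bmu = bmu_index(nodes, x)
        for k, node in enumerate(nodes):
            # Gaussian neighborhood measured on the map grid, not in input space
            h = math.exp(-sq_dist(grid[k], grid[bmu]) / (2.0 * sigma ** 2))
            for d in range(len(node)):
                node[d] += lr * h * (x[d] - node[d])
    return nodes


def som_classification_error(nodes, train_x, train_y, test_x, test_y):
    """Fraction of test documents whose BMU's majority training label differs from their own."""
    votes = {}
    for x, y in zip(train_x, train_y):
        votes.setdefault(bmu_index(nodes, x), Counter())[y] += 1
    # Assumed rule for nodes that won no training document: predict the overall majority class.
    fallback = Counter(train_y).most_common(1)[0][0]
    errors = 0
    for x, y in zip(test_x, test_y):
        node_votes = votes.get(bmu_index(nodes, x))
        predicted = node_votes.most_common(1)[0][0] if node_votes else fallback
        errors += predicted != y
    return errors / len(test_x)


# Usage on hypothetical reduced document vectors with two classes.
train_x = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
train_y = ["a", "a", "b", "b"]
som = train_som(train_x, rows=2, cols=2, n_iter=500)
print(som_classification_error(som, train_x, train_y, [[0.85, 0.15], [0.15, 0.85]], ["a", "b"]))
```

Because the map and the labeling rule stay fixed, differences in this error can be attributed to the document representation fed to the SOM, which is the kind of comparison the abstract describes.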

Similar resources

Self-organization of Very Large Document Collections: State of the Art

The Self-Organizing Map (SOM) forms a nonlinear projection from a high-dimensional data manifold onto a low-dimensional grid. A representative model of some subset of data is associated with each grid point. The SOM algorithm computes an optimal collection of models that approximates the data in the sense of some error criterion and also takes into account the similarity relations of the models...

Mapping discursive dynamics of the financial crisis: a structural perspective of concept roles in semantic networks

Background/purpose: Convenient access to vast and untapped collections of documents generated by organizations is a highly valuable resource for research. These documents (e.g., press releases) are a window into organizational strategies, communication patterns, and organizational behavior. However, the analysis of large document corpora requires appropriate automated methods for text mining an...

Improving CNG Fuel Use in Transportation Sector: Strategic Option Development Approach

Energy is one of the main pillars of the economic cycle. The environmental pollution caused by the consumption of gasoline and diesel fuel, the problems and limitations of supplying and supplying fuel within the country, moving towards self-sufficiency in the supply of gasoline and diesel fuel, reducing government spending against income and also allocating macro subsidies to keep their prices...

Exploration of Text Collections with Hierarchical Feature

Document classification is one of the central issues in information retrieval research. The aim is to uncover similarities between text documents. In other words, classification techniques are used to gain insight into the structure of the various data items contained in the text archive. In this paper we show the results from using a hierarchy of self-organizing maps to perform the text classificat...

Journal:
  • Neurocomputing

Volume 70

Publication date: 2006